Abstract: The topic of this paper is a novel Bayesian continuous-basis field
representation and inference framework. Within this paper several problems 
are solved: The maximally informative inference of continuous-basis fields, 
that is where the basis for the field is itself a continuous object and not repre- 
sentable in a finite manner; the tradeoff between accuracy of representation 
in terms of information learned, and memory or storage capacity in bits; 
the approximation of probability distributions so that a maximal amount of 
information about the object being inferred is preserved; an information the- 
oretic justification for multigrid methodology. The maximally informative 
field inference framework is described in full generality and denoted the Gen- 
eralized Kalman Filter. The Generalized Kalman Filter allows the update of 
field knowledge from previous knowledge at any scale, and new data, to new 
knowledge at any other scale. An application example instance, the inference 
of continuous surfaces from measurements (for example, camera image data), 
is presented. 
1 Overview

The paper begins by reviewing traditional approaches to surface representa- 
tion and inference. Then the new field representation and inference paradigm 
is introduced within the context of maximally informative (MI) inference , 
early ideas appearing in The knowledge representation distribution is 
introduced and discussed in the context of MI inference. Then, using the MI 
inference approach, the here-named Generalized Kalman Filter (GKF) equa- 
tions are derived for a specific example instance of inferring a surface height 
field. The GKF equations motivate a location-dependent adaptive scale or 
multigrid approach to the MI inference of continuous-basis fields. 

2 Introduction: Surface representation 

2.1 Traditional methods 

Many methods for representing surfaces have been utilized previously; however,
these methods involve representing the surface by a discrete basis field,
perhaps with a deterministic interpolation defined (bi-linear, tensor B-splines, 
etc.) to provide a definition for the surface at points intermediate to the dis- 
crete field. Probability distributions or densities of these discrete fields then 
often take the form of normalized exponentials of sums of clique energy func- 
tions, and produce a construct commonly known as a Markov Random Field. 
(See Geman 0, for an often cited example.) There are several immediate 
observations on these approaches: 

• The surface remains unspecified at points intermediate to the discrete 
field, except by the often undefined notion of interpolation. 

• When interpolation is not defined, the discrete field probability dis- 
tribution says nothing about the probability distribution of surface at 
points intermediate to the discrete field points. 

• When interpolation is defined then, given a value of the discrete field, 
there is no uncertainty in the surface intermediate to the discrete field 
points. There is a deterministic mapping from any given discrete field to 
the corresponding continuous surface. In particular, when the discrete 
field basis covers a fixed grid on the (x, y) plane with z heights at each 
grid point, known here as a height field, all sampling of the surface
intermediate to the fixed grid is determined at the scale of the fixed
grid. This is generally not physical; see the next item.

• The surface distribution is not an intrinsic property of any physical sur- 
face, rather a post-hoc imposition of the analyst attempting a useful 
regularization. For instance, necessary scaling properties are ignored: 
Moving a camera closer to the surface, for example, so that the density 
of sample points on the physical surface increases, is not properly rep- 
resented in the fixed basis of the discrete field distribution; there is no 
consistency imposed that requires a subsampled set of points to have 
the same probability density that one would find by marginalizing the 
surface distribution over the sample points not in the subsampling. 

2.2 Scaling consistency 

The consistency condition mentioned in the last section, which must be im- 
posed on probability distributions for continuous fields is: 

Scaling of sample points consistency: For $S \subset A$ indices of
discrete field variables,

    P(h_S) = \int P(h_A) \, dh_{A \setminus S}    (1)

Note that equation 1 is a condition which must be imposed on the distributions
which any modelling system learns where it is sensible to supersample
or subsample the field arbitrarily, as in the continuous field basis case.
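
As a hedged numerical illustration of the consistency condition (a sketch, not part of the original derivation), consider a three-point chain with a nearest-neighbour quadratic energy, i.e. a small Gaussian Markov Random Field. Marginalizing out the middle point gives an endpoint covariance that differs from the one obtained by writing the same clique energy directly on a two-point grid, so nothing in the discrete formulation enforces equation 1. A covariance built from a function of position, by contrast, satisfies the condition by construction. The helper names, the coupling values, and the squared-exponential covariance are all illustrative assumptions.

```python
import math

def inv2(m):
    """Invert a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def inv3(m):
    """Invert a 3x3 matrix via the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]

def submatrix(m, idx):
    """Rows and columns of m selected by idx (the Gaussian marginal covariance)."""
    return [[m[i][j] for j in idx] for i in idx]

def chain_precision(n, lam, kappa):
    """Precision matrix of the energy 0.5*(lam*sum h_i^2 + kappa*sum (h_i - h_{i+1})^2)."""
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        q[i][i] += lam
    for i in range(n - 1):
        q[i][i] += kappa
        q[i + 1][i + 1] += kappa
        q[i][i + 1] -= kappa
        q[i + 1][i] -= kappa
    return q

def kernel_cov(points, length=1.0):
    """Covariance defined as a function of position (assumed squared-exponential form)."""
    return [[math.exp(-0.5 * ((a - b) / length) ** 2) for b in points] for a in points]

# Discrete clique-energy model: marginal of the fine grid vs. the same model on the coarse grid.
mrf_marginal = submatrix(inv3(chain_precision(3, 1.0, 1.0)), [0, 2])
mrf_direct = inv2(chain_precision(2, 1.0, 1.0))

# Position-based covariance: subsampling and marginalizing agree.
gp_marginal = submatrix(kernel_cov([0.0, 1.0, 2.0]), [0, 2])
gp_direct = kernel_cov([0.0, 2.0])

print(mrf_marginal, mrf_direct)
print(gp_marginal == gp_direct)
```

In this sketch the clique-energy model would need its parameters re-tuned by hand at every sampling density, whereas the position-based covariance carries the consistency automatically.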

2.3 Elements of the paradigm

The rest of this paper discusses an approach to continuous field inference 
which corrects the deficiencies, including the intermediate value and scaling 
problems, of traditional discrete-basis approaches to the inference of discrete 
height fields, for example. The new approach is here named the Generalized 
Kalman Filter. 

There are four central objects of importance within the inference approach 
described in this paper, one of which is a new object to Bayesian inference: 
• The prior distribution for field. The prior holds all information about
fields before any data is observed. 

• The likelihood distribution. The likelihood is predictive for data, 
given the field. It incorporates all of the physics of the measurement 
process. 

• The posterior distribution. The posterior distribution summarizes 
everything knowable about the field given assumptions of likelihood 
form, the prior knowledge, and all data. 

• The knowledge-representation (KR) distribution. Within the usual 
Bayesian point of view, the KR distribution is the new mathematical 
object. In the paradigm described in this paper the KR distribution 
is the object updated when new data arrives. The KR distribution 
is parameterized by maximally informative statistics (see 0) for the 
learned field knowledge. Note that because the KR distribution has 
a finite number-of-values limitation, the KR distribution is not neces- 
sarily able to represent what could have been learned from data about 
the (continuous) field. Generally, the prior distribution and the KR 
distribution determine an approximation (possibly exact) to the field 
posterior distribution. It should be noted that modern computer ar- 
chitecture (memory and space-time) constraints appear to be the fun- 
damental physical drivers for the utilization of the KR distribution, 
simply because storing the exact posterior generally requires an infi- 
nite amount of memory. 

In the height field inference application discussed later the KR distri- 
bution is parameterized by heights at a set of discrete basis points, but 
holds knowledge about a continuous basis height field. However, gen- 
erally, the KR distribution may use an arbitrary set of basis functions. 

One advance of the GKF is that the KR distribution is naturally adap- 
tive in both dimension and scale, allowing the learning of continuous- 
basis field information at the appropriate scale, where appropriate. 

Benefits of the approach described in this paper are that it has these infor- 
mation theoretically optimal features: 1. A location-dependent adaptive and 
scalable multigrid-like algorithm, so that only the bytes necessary to repre- 
sent the learned information are stored, leading to a style of maximally sparse 
representation of surface knowledge; 2. A recursive updating algorithm. It
will become clear that the Bayesian GKF field inference paradigm also has 
these properties: 

• It is the information learned about the field, (the KR distribution), 
which takes the form of a distribution over discrete values. In the 
surface inference example these discrete values are heights at discrete 
basis points. 

• The prior distribution for fields, in conjunction with the learned knowl- 
edge of the field held within the KR distribution determine a well- 
defined posterior distribution over continuous fields. 

• The field posterior distribution is always a well defined quantity every- 
where. In the surface inference example discussed later, this continuity 
is at points intermediate to the discrete height field basis points of the 
KR distribution. 

• The scaling condition, equation 1, is automatically imposed because the
posterior distribution is a distribution over fields. 

As an example consider the inference of continuous surfaces: While it may 
seem obvious, in the case of continuous surface inference, that what one is 
actually representing with a discrete set of values in memory is only a part of 
the information which helps to determine the surface posterior distribution, it 
is unusual to not be discussing the height field as the primary representation 
of surface. It is the inherently discrete nature of the storage of information 
in machines which forces us into this stance - generally it is impossible to 
represent an arbitrary continuous field with a finite set of discrete values - 
one must also have another object from which to compute the intermediate 
values of the field. (Another way to look at the disparity between the current 
proposal for field inference and traditional proposals is that the traditional 
approaches are sufficient only for band-limited fields.) 

In section 3 the GKF is specialized to height fields, where an example,
surface representation and learning, of the GKF paradigm is described. (The 
approach taken in this section is to specialize to a case that is then easily 
seen to generalize to the general continuous basis field inference paradigm.) 
The next section continues with observations on the update scheme. Further 
sections continue with the example special case for surface distributions with
particularly tractable mathematics, and final sections provide explicit forms 
for the general GKF equations, a discussion on their relationship to the 
standard Kalman filter, a discussion on the amount of information learned at 
each update, and a search heuristic. Extensive appendices provide supporting 
mathematics for the derivations. 

3 Surface representation and inference 

In this section the main ideas of the Bayesian surface representation and infer- 
ence paradigm presented in this paper are given. The technique is general, 
though: section ^ discusses the extension to an arbitrary-basis, arbitrary- 
dimension field. 

3.1 Surface distributions 

The surface and height field distributions (the prior, likelihood, and posterior 
surface and height field distributions) are discussed in this section. 

3.1.1 Surface and height field prior distributions 

Consider a set $S$ of surfaces where each element $s \in S$ is a height field, i.e.
such that $s = s(x, y)$ is a real function of two variables. Write the prior
probability distribution for surfaces in $S$, given the parameters $\theta$ which
determine the prior distribution, as

    P(s \mid \theta).    (2)

Consider a vector $v = (v_1, \ldots, v_n)$ of discrete $(x, y)$ points, $v_i = (x_i, y_i)$. For
any given surface $s$ denote the associated vector of heights by $h(s, v) =
(h_1(s, v), \ldots, h_n(s, v))$. Write the prior distribution of the surface heights at
the chosen points $v$ as $P(h_v \mid \theta)$. This discrete height distribution may be
found as follows:

    P(h_v \mid \theta) = \int P(h_v, s \mid \theta) \, ds    (3)
                       = \int P(h_v \mid s) \, P(s \mid \theta) \, ds    (4)
                       = \int \delta(h_v - h(s, v)) \, P(s \mid \theta) \, ds    (5)

where the vector delta-function is defined as

    \delta(h_v - h(s, v)) = \prod_{i=1}^{n} \delta(h_{v,i} - h_i(s, v)).    (6)

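Now, given that what is known is the surface heights $h_v$ at a vector $v$ of
discrete $(x, y)$ points, the posterior distribution of surfaces is found from
Bayes' theorem as

    P(s \mid h_v, \theta) = \frac{P(h_v \mid s, \theta) \, P(s \mid \theta)}{P(h_v \mid \theta)}    (7)
                          = \frac{P(h_v \mid s) \, P(s \mid \theta)}{P(h_v \mid \theta)}    (8)
                          = \frac{\delta(h_v - h(s, v)) \, P(s \mid \theta)}{P(h_v \mid \theta)}    (9)

where the denominator distribution was found in equations 3-5.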

3.1.2 Measurements: The Likelihood

In general, a surface $s$ and some other parameters $\phi$ not dependent upon $s$
(e.g. camera point spread function, camera position and direction, lighting
position and direction) specify the probability distribution for data
(likelihood)

    P(x \mid s, \phi, \theta) = P(x \mid s, \phi)    (10)

where the data distribution is independent of $\theta$ once $s$ is known.

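3.1.3 Conditioning on data: Surface and height field posterior distributions

Given data, the surface posterior distribution is inferred using Bayes' theorem

    P(s \mid x, \phi, \theta) = \frac{P(x \mid s, \phi, \theta) \, P(s \mid \phi, \theta)}{P(x \mid \phi, \theta)}    (11)
                              = \frac{P(x \mid s, \phi) \, P(s \mid \theta)}{P(x \mid \phi, \theta)}.    (12)

The distribution of the surface posterior marginalized to a set of discrete
points may be written using equations 11-12, doing steps similar to those
taken in equations 3-5, as

    P(h_v \mid x, \phi, \theta) = \int P(h_v, s \mid x, \phi, \theta) \, ds    (13)
        = \int P(h_v \mid s, x, \phi, \theta) \, P(s \mid x, \phi, \theta) \, ds    (14)
        = \int \delta(h_v - h(s, v)) \, P(s \mid x, \phi, \theta) \, ds.    (15)
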
In steps similar to equations 7-9, the surface posterior when a height field is
also known is given by

    P(s \mid h_v, x, \phi, \theta) = \frac{P(h_v, x \mid s, \phi, \theta) \, P(s \mid \phi, \theta)}{P(h_v, x \mid \phi, \theta)}    (16)
        = \frac{P(h_v \mid s) \, P(x \mid s, \phi) \, P(s \mid \theta)}{P(h_v, x \mid \phi, \theta)}    (17)
        = \frac{\delta(h_v - h(s, v)) \, P(x \mid s, \phi) \, P(s \mid \theta)}{P(h_v, x \mid \phi, \theta)}    (18)

where we used the facts that, given a surface, the data and the surface heights
are independent, and the surface distribution is independent of the camera
and lighting parameters $\phi$.



3.2 Approximating the posterior 

One motivation for approximating the surface distribution is that generally 
a surface is an uncountably infinite, continuous entity, and therefore there is 
little else which can be done to represent it exactly other than to go into, lit- 
erally, infinite detail (requiring an infinite supply of memory). It is therefore 
useful to have an approximation scheme which, although finite, captures the 
relevant information provided by data. Another excellent reason for develop- 
ing an approximation is mathematical tractability. Having a representation 
scheme which allows a tractable calculation of the posterior is a huge benefit 
for both computation and communication. Finally, it is of great interest to 
not waste computational resources while representing learned surface infor- 
mation. The solution to the surface representation problem presented here 
addresses the competition for representational resources (memory) issue in a 
unique manner. 
3.2.1 The knowledge representation distribution
The full posterior may be written in the form

    P(s \mid x, \phi, \theta) = \int P(s \mid h_v, x, \phi, \theta) \, P(h_v \mid x, \phi, \theta) \, dh_v    (19)

where the distributions inside the integral appear in equations 13-18. The
issue of generating a finite representation is not yet resolved via equation 19,
however, since storing information sufficient to determine the distributions
$P(s \mid x, \phi, \theta)$ and $P(s \mid h_v, x, \phi, \theta)$ generally requires either storing an
infinite set of values, which no finite amount of memory can hold, or storing
all data, disallowing any discarding of data and the incremental updating of
the representation. Instead, consider the following approximation where the
prior conditioned on a set of heights, along with a new distribution, the
knowledge representation distribution $P(h_v \mid x, \phi, \theta)$, are substituted for the
distributions inside the integral of equation 19:

    P(s \mid P(h_v \mid x, \phi, \theta)) = \int P(s \mid h_v, \theta) \, P(h_v \mid x, \phi, \theta) \, dh_v    (20)



It is important to note at this point that any suitable surface distribution
may be substituted into the right-hand side of equation 20 for $P(s \mid h_v, \theta)$,
since it is important only that the resulting integral be capable of making
a good approximation to the true posterior. Further, it is not necessary to
restrict the basis $v$ to discrete height field basis points; any suitable basis may
be taken, for instance Fourier components. Although all of the calculations
of this paper are carried through with the form of equation 20, other forms may
prove more convenient, and it is not difficult to suggest others. In particular,
since equation 20 will be used in an iterative update loop later, updates
that take for the right-hand side prior term the last posterior term appear
quite reasonable (the corresponding GKF update equations may be found
immediately from those presented later).

Although conditioning on the KR distribution $P(h_v \mid x, \phi, \theta)$ may seem
strange, a good way to understand the meaning is that it is the KR distri-
bution which is being used as a statistic for the learned surface information.
The key thing to notice in equation 20 is that, under reasonable regularity
conditions, choosing the points of $v$ sufficiently dense makes the approximation
to the full posterior arbitrarily good. The trick will be to






choose $v$ appropriately, properly weighting the competing need to approxi-
mate arbitrarily well everywhere with the limited resources that are imposed
when a finite amount of storage is available, i.e. when the dimensionality of
$v$ is fixed. This will be addressed in the next section. In the case of sim-
ple imaging systems, the point spread function and pixel diameter are good
indicators of the necessary sampling scale for $v$. In the super-resolved case,
the resolution expected to be available from the data is the appropriate scale
for $v$.



The approximation to the posterior of equation 20 has several properties which
make it valuable: 

• The prior distribution $P(s \mid h_v, \theta)$ which supplies the uncertainties
associated with points of the surface not in the vector v may be cho- 
sen to have a simple form (see appendix p.2.1|) that is easily encoded 
algorithmically in finite memory. 

• There is a clear separation between what was already known - the
prior $P(s \mid h_v, \theta)$, and what has been learned - the KR distribution.

• There is a clear description of the scale at which information has been 
acquired in terms of the density and uncertainties associated with the 
points {v,h{s,v)) on the surface, and in terms of the uncertainties of 
their positions as encoded in the KR distribution. 

In practice, it is useful to take a multinormal distribution over the discrete-
point height field as the KR distribution. Let the parameterization of the
KR distribution be $\Theta_v$. For example, if the KR is taken to be multinormal
then the parameters of that distribution are

    \Theta_v(x) = (\mu_v(x), \Sigma_v(x)),    (21)

the mean and covariance matrix of the multinormal, where the functional
dependence on $x$ indicates a data dependency through the update procedure,
and the subscript $v$ indicates that the parameters parameterize a distribution
of heights at points $v$. Because the KR distribution and its parameters are
related by a one-to-one mapping, re-write equation 20 as

    P(s \mid \Theta_v, \theta) = \int P(s \mid h_v, \theta) \, P(h_v \mid \Theta_v) \, dh_v.    (22)

In summary, we have arrived at an approximation to the surface posterior
distribution, via the KR distribution, parameterized by $\Theta_v$.
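
To make equation 22 concrete, the following is a minimal sketch for a one-dimensional "surface" with a two-point basis. It assumes a zero-mean multinormal prior with a squared-exponential covariance (an illustrative stand-in, not the prior constructed in the appendices) and a multinormal KR distribution. Under those assumptions the integral of equation 22 can be evaluated in closed form at any intermediate point: the mean is the prior's interpolation of the KR mean, and the variance is the prior's conditional variance plus the KR uncertainty propagated through the interpolation weights. The names `kernel` and `surface_marginal` are hypothetical.

```python
import math

def kernel(a, b, length=1.0):
    """Assumed prior covariance between surface heights at positions a and b."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def inv2(m):
    """Invert a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def surface_marginal(x_star, v, kr_mean, kr_cov):
    """Mean and variance of s(x_star) under equation 22 for a two-point basis v."""
    k_inv = inv2([[kernel(p, q) for q in v] for p in v])
    k_star = [kernel(x_star, p) for p in v]
    # Interpolation weights implied by the prior: w = K^{-1} k_star.
    w = [sum(k_inv[i][j] * k_star[j] for j in range(2)) for i in range(2)]
    mean = sum(w[i] * kr_mean[i] for i in range(2))
    # Uncertainty already present in the prior once the basis heights are fixed.
    prior_var = kernel(x_star, x_star) - sum(w[i] * k_star[i] for i in range(2))
    # Uncertainty in the basis heights themselves, as held by the KR distribution.
    learned_var = sum(w[i] * kr_cov[i][j] * w[j] for i in range(2) for j in range(2))
    return mean, prior_var + learned_var

basis = [0.0, 1.0]
kr_mean = [1.0, 2.0]
kr_cov = [[0.01, 0.0], [0.0, 0.04]]

mean_at_basis, var_at_basis = surface_marginal(0.0, basis, kr_mean, kr_cov)
mean_mid, var_mid = surface_marginal(0.5, basis, kr_mean, kr_cov)
print(mean_at_basis, var_at_basis)
print(mean_mid, var_mid)
```

At a basis point the result reduces to the KR mean and variance there; between basis points the variance grows, which is the intermediate-point uncertainty that a deterministic interpolation would discard.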



3.3 Updating the knowledge representation

Now we discuss updating $\Theta_v$ when new data are acquired. Temporarily
restrict attention to the fixed $v$ case. During this and the next sections refer
to figure 1 for a flowchart of the general GKF update process.



3.3.1 Bayes' theorem 

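Having acquired $\Theta_v^n = \Theta_v(x^n)$ from previously seen data $x^n = (x_1, \ldots, x_n)$,
and upon seeing new data $x_{n+1}$, the goal is to find $\Theta_v^{n+1}$ such that the
surface distribution given $\Theta_v^{n+1}$ approximates the surface distribution given
$x_{n+1}$ and $\Theta_v^n$. Given new data $x_{n+1}$ in the context of the previously seen
data $x^n$ summarized by $\Theta_v^n$, our updated surface distribution is found via
Bayes' theorem

    P(s \mid x_{n+1}, \Theta_v^n, \phi, \theta) = \frac{P(x_{n+1} \mid s, \Theta_v^n, \phi, \theta) \, P(s \mid \Theta_v^n, \phi, \theta)}{P(x_{n+1} \mid \Theta_v^n, \phi, \theta)}    (23)
        = \frac{P(x_{n+1} \mid s, \phi) \, P(s \mid \Theta_v^n, \theta)}{P(x_{n+1} \mid \Theta_v^n, \phi, \theta)}.    (24)

The updated posterior $P(s \mid x_{n+1}, \Theta_v^n, \phi, \theta)$ will be approximated by the
$\Theta_v^{n+1}$ parameterized KR distribution of equation 22 as

    P(s \mid \Theta_v^{n+1}, \theta) = \int P(s \mid h_v, \theta) \, P(h_v \mid \Theta_v^{n+1}) \, dh_v.    (25)

The approximation condition for determining $\Theta_v^{n+1}$ is then written

    P(s \mid \Theta_v^{n+1}, \theta) \approx P(s \mid x_{n+1}, \Theta_v^n, \phi, \theta).    (26)
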
Equation 26 suggests we try to minimize various measures of the closeness
of the two distributions. For example, one measure is the average square
difference of the two distributions,

    \int |P_1(s) - P_2(s)|^2 \, ds    (27)

but there is (apparently) no good first-principles reason to use this form.
In the next section we discuss the measure of distance which leads to the
maximally informative choice of $\Theta_v^{n+1}$.

3.3.2 Maximally informative inference 

The maximally informative choice for the statistic $\Theta_v^{n+1}$ is the one that
provides the most information about the surface distribution. The condition
for being maximally informative, see 0, is that the Kullback-Leibler distance
$D(P_1(s), P_2(s))$ is minimized, where

    D(P_1(s), P_2(s)) = \int P_1(s) \log \left( \frac{P_1(s)}{P_2(s)} \right) ds    (28)

and where the P's above are posterior distributions of field, that is

    P_1(s) = P(s \mid x_{n+1}, \Theta_v^n, \phi, \theta)    (29)
    P_2(s) = P(s \mid \Theta_v^{n+1}, \theta).    (30)

That is:

Find the $\Theta_v^{n+1}$ such that

    \partial_{\Theta_v^{n+1}} \int P(s \mid x_{n+1}, \Theta_v^n, \phi, \theta) \log \left( \frac{P(s \mid \Theta_v^{n+1}, \theta)}{P(s \mid x_{n+1}, \Theta_v^n, \phi, \theta)} \right) ds = 0    (31)

while at the $\Theta_v^{n+1}$ satisfying the derivative condition above

    \det \left[ \partial^2_{\Theta_v^{n+1}} \int P(s \mid x_{n+1}, \Theta_v^n, \phi, \theta) \log \left( \frac{P(s \mid \Theta_v^{n+1}, \theta)}{P(s \mid x_{n+1}, \Theta_v^n, \phi, \theta)} \right) ds \right] < 0    (32)

i.e., the Hessian is negative definite and the extremum is a local maximum of
the integral, which is the negative of $D$. If possible, choose the global
maximum. Note that the Kullback-Leibler distance is asymmetric. Generally, it
is highly relevant which distribution contains the prior information and which
distribution is being updated. Maximum entropy techniques reverse the roles of
$P_1$ and $P_2$ which appear here. For a detailed explanation see 0.
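
The asymmetry noted above is easy to see numerically. The sketch below uses the standard closed form for the Kullback-Leibler distance between two one-dimensional normal distributions; the function name `kl_gauss` and the parameter values are illustrative only.

```python
import math

def kl_gauss(m1, s1, m2, s2):
    """D(P1, P2) of equation 28 for P1 = N(m1, s1^2) and P2 = N(m2, s2^2)."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2.0 * s2 ** 2) - 0.5

narrow_to_wide = kl_gauss(0.0, 1.0, 0.0, 2.0)  # P1 narrow, P2 wide
wide_to_narrow = kl_gauss(0.0, 2.0, 0.0, 1.0)  # roles reversed
same = kl_gauss(0.0, 1.0, 0.0, 1.0)

print(narrow_to_wide, wide_to_narrow, same)
```

Approximating a wide distribution by a narrow one is penalized more heavily than the reverse, which is why it matters which of the two posteriors plays the role of $P_1$.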

In the following section are some observations on the approach taken to
maximally informative surface inference. Section 5 then briefly makes explicit
the specific distribution forms which are assumed. The Generalized Kalman
Filter update equations for the surface inference example which follow from
this approach are then presented in section 6, completing the derivation of
the maximally informative approach.

4 Observations on the update scheme 

Note the following: 

• The updating scheme described here is a maximally informative up-
date scheme and is related to the Kalman filter. The Kalman filter
is a minimum variance filtering scheme applicable in the case of fixed
representation dimension. The crucial step which has been taken in
the current work is the step of allowing the representation scheme to
be adaptable. We have adopted the label "Generalized Kalman Filter"
(GKF) to describe the idea represented here. The GKF equations are
presented in section 6.

• To this point we have only optimized over $\Theta_v$. It is clear that we may
also vary the number of vertices $|v|$ of the representation, allowing opti-
mization over the number of vertices. Varying the number of vertices of
the representation is absolutely necessary if surface knowledge at scales
smaller than the current set of vertices represents is to ever accumulate.
In section 6 the GKF update equations are derived assuming that the
number of vertices in the representation basis vertex set is arbitrary at
each update.

• Beyond allowing the number of vertices to vary, the positions of the
vertices may be allowed to vary. In section 6 the GKF update equations
are derived assuming that the representation basis vertex set positions
are arbitrary.

• Detecting when and where new vertices are necessary is a matter of
observing directly in equations 28 or 31 when new data produces a
lower surface uncertainty over a region, and when having smaller un-
certainty at neighboring vertices is not sufficient to represent this lower
uncertainty over the region.

• The vertex representation for the surface knowledge is convenient, but
not necessary. For example it is possible to extend a height field to
a height-and-reflectance field or "arbitrary dimension field", where the
reflectance lies within a many-dimensional space. Reasonable struc-
tures for the covariance matrix allow differing correlations between re-
flectance values and between height values. It will be seen in a later
section that the GKF update equations are easily used in the "arbitrary
dimension field" context.

• In its most abstract form, instead of having a "field", there is simply
a set of objects, while for each "object" there is an associated vector
of properties, where some of the components of the property vector
may be considered a location in space. In this fairly abstracted setting,
the collection of objects has an associated joint probability distribu-
tion which describes the probability distribution over configurations of
objects. It will be seen in a later section that the GKF update equations
are easily understood in the "object" context.

• Equation 28, which defines the quantity to be minimized, is where a
penalty term may be introduced indicating how many bits in hardware are
available in trade for each bit of information learned from data. For example,
one might penalize the KL distance by 1/10th the number of bytes it takes
to represent the new information gained by extending the number of
points represented. The exact form of the information learned about
the surface distribution contained in the KR distribution is found in
a later section, where the dimensionality of the representation enters directly,
and where bits-used penalty-terms may be introduced.

• The previous note points out how a minimum description length (MDL)
method fails for this problem. It is certainly the case that our update
scheme may require much more memory (in bits) to represent the infor-
mation learned than the information learned (in bits). At some point,
if information at small enough scales is desired, MDL would truncate
and stop. Clearly, applying MDL would then be a disaster. On the
other hand, what seems to work here may be called an adaptive MDL
approach.

• Note that a method like maximum entropy is entirely deficient for pro-
viding distributions of surfaces given the constraints implied by the
knowledge of the distribution of the heights at discrete points: max-
imum entropy ignores correlations between nearby surface points no
matter how close, an entirely ludicrous situation. On the other hand, a
method like relative maximum entropy, based on inverting the roles of
the distributions in equation 28, claims to provide the least informative
inference relative to the prior information, a heuristic that is difficult to
justify at best. Further, such approaches are typically based on likelihood
distributions, rather than the posteriors that appear in equation 28.



5 Surface Distribution Forms 

5.1 Prior

For simplicity of mathematical presentation only, the prior in our surface 
inference example is taken multinormal over continuous, smooth height fields. 
One particular, conveniently chosen, representation of the prior distribution 
is constructed in appendix p.2.1| . This prior may be written in the shorthand 



    P(s \mid \theta) = N(\mu_s, \Sigma_s)(s)    (33)

where $\theta = (\mu_s, \Sigma_s)$ is the parameter vector. The density of the height field
determined by the prior is

    P(h_v \mid \theta) = \int P(h_v \mid s) \, P(s \mid \theta) \, ds    (34)
                       = \int \delta(h_v - h(s, v)) \, P(s \mid \theta) \, ds    (35)
                       = N(\mu_v, \Sigma_v)(h_v)    (36)

where

    \Sigma_v = A_{vs} \Sigma_s A_{vs}^T    (37)

and the projection onto the height field is given by $A_{vs}$. Note that equation 37
implies that the surface density covariance is represented differently than a
discrete surface distribution covariance matrix. Specifically, the projection
matrix $A_{vs}$ is a delta-function-like operator, and $\Sigma_s$ is a continuous function
of two positions. In appendix 12.1 we show that the surface density has
a compact continuous power spectrum representation, and there give the
explicit form of that representation. Thus the notation of equation 33 must
be considered a shorthand for the underlying continuous construct.

5.2 Likelihood 

When measurement is modelled as a linear process corrupted by Gaussian
noise we have

    x = Ms + e
    e \sim N(0, \Sigma_e)    (38)

or

    P(x \mid s, \phi) = N(Ms, \Sigma_e)(x)    (39)

where $\phi = (M, \Sigma_e)$ is the parameter vector.
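
As a small sketch of equation 39, the log-likelihood can be evaluated once the surface is replaced by a finite vector of samples and the noise covariance is taken to be isotropic, $\Sigma_e = \sigma^2 I$. Both simplifications are assumptions made here for illustration, as are the function name `log_likelihood` and the two-pixel averaging operator used as $M$.

```python
import math

def log_likelihood(x, M, s, noise_var):
    """log N(M s, noise_var * I)(x) for a finite-dimensional stand-in for the surface s."""
    predicted = [sum(row[j] * s[j] for j in range(len(s))) for row in M]
    squared_error = sum((xi - pi) ** 2 for xi, pi in zip(x, predicted))
    n = len(x)
    return -0.5 * n * math.log(2.0 * math.pi * noise_var) - 0.5 * squared_error / noise_var

# Three surface samples observed by two "pixels", each averaging two neighbours.
surface = [1.0, 2.0, 3.0]
M = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5]]

best = log_likelihood([1.5, 2.5], M, surface, 0.1)  # data equal to the prediction M s
worse = log_likelihood([1.5, 3.0], M, surface, 0.1)  # second pixel off by 0.5
print(best, worse)
```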

6 The Generalized Kalman Filter equations. 

In this section the Generalized Kalman Filter update equations, specialized
to the discrete basis multinormal KR distribution of equation 21, are concisely
derived. The updated KR need not have the same basis
dimension nor position as the previous KR basis, solving the problem of how
to allow updates from one representation to the next, whether the same, finer,
or coarser representation.

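Proceeding, the KR distribution in terms of the parameterized height
field of equation 22 is

    P(s \mid \Theta_v^n, \theta) = \int P(s \mid h_v, \theta) \, P(h_v \mid \Theta_v^n) \, dh_v.    (40)

The distribution of surface given the height field from equations 7-9 is

    P(s \mid h_v, \theta) = \frac{P(h_v \mid s) \, P(s \mid \theta)}{P(h_v \mid \theta)}
                          = \frac{\delta(h_v - h(s, v)) \, P(s \mid \theta)}{P(h_v \mid \theta)}.    (41)

Simplify the integral of the KR distribution to find

    P(s \mid \Theta_v^n, \theta) = P(s \mid \theta) \int \delta(h_v - h(s, v)) \, \frac{P(h_v \mid \Theta_v^n)}{P(h_v \mid \theta)} \, dh_v
                                 = P(s \mid \theta) \, \frac{P(h(s, v) \mid \Theta_v^n)}{P(h(s, v) \mid \theta)}.    (42)

Note how the full surface distribution is simply modified by the ratio

    \frac{P(h(s, v) \mid \Theta_v^n)}{P(h(s, v) \mid \theta)}.    (43)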



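From equations 23-24 the Bayesian update of the KR distribution is

    P(s \mid x_{n+1}, \Theta_v^n, \phi, \theta) \propto P(x_{n+1} \mid s, \phi) \, P(s \mid \Theta_v^n, \theta).    (44)

Rewriting the updated distribution using equation 42 yields

    P(s \mid x_{n+1}, \Theta_v^n, \phi, \theta) \propto P(x_{n+1} \mid s, \phi) \, P(s \mid \theta) \times \frac{P(h(s, v) \mid \Theta_v^n)}{P(h(s, v) \mid \theta)}.    (45)

For maximally informative inference of the new KR we minimize, from equa-
tion 28,

    D(P_1(s), P_2(s)) = D(P(s \mid x_{n+1}, \Theta_v^n, \phi, \theta), P(s \mid \Theta_{\bar{v}}^{n+1}, \theta))
        = \int P(s \mid x_{n+1}, \Theta_v^n, \phi, \theta) \log \left( \frac{P(s \mid x_{n+1}, \Theta_v^n, \phi, \theta)}{P(s \mid \Theta_{\bar{v}}^{n+1}, \theta)} \right) ds.    (46)
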
Note that it is not assumed here that $v$ and $\bar{v}$ have the same dimension. Ex-
panding the probability distributions within the logarithm appearing above
yields

    D(P_1(s), P_2(s)) = \int P(s \mid x_{n+1}, \Theta_v^n, \phi, \theta)
        \times [ - \log P(h(s, v) \mid \theta)
                 + \log P(h(s, \bar{v}) \mid \theta)
                 + \log P(x_{n+1} \mid s, \phi)
                 - \log P(x_{n+1} \mid \Theta_v^n, \phi, \theta)
                 + \log P(h(s, v) \mid \Theta_v^n)
                 - \log P(h(s, \bar{v}) \mid \Theta_{\bar{v}}^{n+1}) ] \, ds    (47)

Each term has the form of an information (or uncertainty). Together the
six terms paint a descriptive picture of how information is acquired by the
maximally informative update when taken as three groups of two terms:
Denote by "new KR" the two terms with $\bar{v}$ and $\Theta_{\bar{v}}^{n+1}$, by "previous KR" the
two terms with $v$ and $\Theta_v^n$ and no data, and by "new data" the two terms
with data dependency. Now, noting the signs on these quantities, because
$D$ is positive, the whole point of choosing a good $\Theta_{\bar{v}}^{n+1}$ approximation by
minimizing $D$ is that

    Expected information in new KR \approx
        (Expected information in previous KR
         + Expected information in new data)    (48)

or in very rough terms we may see the update as capturing the sum-total of
the available knowledge

    Total knowledge = Prior knowledge + New knowledge from data    (49)

Because only terms depending upon the update parameters $\bar{v}$ and $\Theta_{\bar{v}}^{n+1}$ are
needed to perform the minimization, we drop the other terms at this point,
and after making the multinormal substitutions for the distributions in the
above we have

    D(P_1(s), P_2(s)) = \int P(h_{\bar{v}} \mid x_{n+1}, \Theta_v^n, \phi, \theta) \log ( N(\mu_{\bar{v}}, \Sigma_{\bar{v}})(h_{\bar{v}}) ) \, dh_{\bar{v}}
        - \int P(h_{\bar{v}} \mid x_{n+1}, \Theta_v^n, \phi, \theta) \log ( N(\mu_{\bar{v}}^{n+1}, \Sigma_{\bar{v}}^{n+1})(h_{\bar{v}}) ) \, dh_{\bar{v}}    (50)



To simplify the P's appearing in equation 50, the distribution of surface given
old knowledge and new data, marginalized to the height field $\bar{v}$, is useful, as
is seen by observing equations 45 and 50. Thus, consider

    P(s \mid x_{n+1}, \Theta_v^n, \phi, \theta) \propto N(M(\phi) s, \Sigma_e)(x_{n+1}) \, N(\mu_s, \Sigma_s)(s)
        \times \frac{N(\mu_v^n, \Sigma_v^n)(h(s, v))}{N(\mu_v, \Sigma_v)(h(s, v))}    (51)

found by making substitutions into equation 45 for the assumed distributions. Since
it is not necessarily the case that $v_i \in \{\bar{v}_j\}$ or that $\bar{v}_i \in \{v_j\}$, proceed by
marginalizing to the union of the components of $v$ and $\bar{v}$, which we denote
$\bar{v} \cup v$, and then to the $\bar{v}$ components. Let $A_{\bar{v} \cup v, s}$ denote the projection from
$v_s$ to $\bar{v} \cup v$, $A_{\bar{v}, \bar{v} \cup v}$ denote the projection from $\bar{v} \cup v$ to $\bar{v}$, and $A_{\bar{v} v}$ denote
the projection from $v$ to $\bar{v}$. In performing the two projections (from $v_s$ to
$\bar{v} \cup v$, and then from $\bar{v} \cup v$ to $\bar{v}$) in order we find (not necessarily in most
simple form), using results of the appendices beginning at 12.2, that



where 



Pis I X 



n+l 



ds\v = N{fij„j:R){h^) 



(52) 



S^^ = Sq^ + (S^)-1-S^i 



(53) 



and where 



,.Q — A_ _A _ ,,P 

y^^ — A- -A - 4^ 



..n _ A ..n 



(54) 



(55) 
y-1 _ A y-iAT



y-i — A y-iAT 



V 



(56) 



(57) 



(58) 



Using the results of appendix 12.6, the quantities defined in the equations above correspond to the values of the mean and covariance parameters of the new KR found at the minimum Kullback-Leibler distance; the minimization is immediately apparent from those results. Thus:

on+l _ ( ,,n+l v^n+l\ 
— IA% 5 ) 

= S« (59) 



The equations above are the Generalized Kalman Filter (GKF) update equations for the surface inference example, yet are quite a bit more general (the change of variables needed when the forward projection is nonlinear appears in appendix 12.10). Having these update equations allows one to consider updating a representation of any dimension relative to the original representation. Thus, knowledge may be represented in finer detail, corresponding to the old representation being contained in the new; knowledge may be represented in the same detail, corresponding to the case when the new representation is the same as the old representation; or knowledge may be tossed, corresponding to the case when the new representation does not contain the old representation. The maximally informative inference approach, and its result of the Kullback-Leibler distance on conditional posteriors, led directly here to the GKF and to the solution of the problem of storing knowledge at scales adaptive to the actual needs of the data driving the update.
The standard KF is discussed in \M. 
7 Specializing the GKF

When the surface of interest is itself a discrete height field, and the KR representation basis never changes in dimension nor position from that height field's basis, then all projections appearing in the GKF update equations are identities, and the update equations simplify to the standard Kalman filter equations, given suitable identification of the variables.
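
As a reminder of the standard filter that this specialization refers to, the following minimal sketch performs one Kalman measurement update for a discrete height field in two equivalent ways: the information (inverse covariance) form, in which precisions add, and the gain form. The names `mu`, `S`, `M`, `S_noise` and `x` (prior mean and covariance, linear forward projection, data noise covariance, data) are illustrative assumptions and are not notation taken from the update equations above.

```python
import numpy as np

def kalman_update_information(mu, S, x, M, S_noise):
    """Measurement update with precisions: S_new^-1 = S^-1 + M^T S_noise^-1 M."""
    P = np.linalg.inv(S)            # prior precision
    W = np.linalg.inv(S_noise)      # data precision
    S_new = np.linalg.inv(P + M.T @ W @ M)
    mu_new = S_new @ (P @ mu + M.T @ W @ x)
    return mu_new, S_new

def kalman_update_gain(mu, S, x, M, S_noise):
    """The same update written with the Kalman gain K."""
    K = S @ M.T @ np.linalg.inv(M @ S @ M.T + S_noise)
    mu_new = mu + K @ (x - M @ mu)
    S_new = S - K @ M @ S
    return mu_new, S_new

# Illustrative numbers only (assumptions, not data from the paper).
mu = np.array([0.0, 1.0, 2.0])
S = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]])
M = np.array([[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]])
S_noise = np.diag([0.3, 0.4])
x = np.array([1.0, -1.0])

info_form = kalman_update_information(mu, S, x, M, S_noise)
gain_form = kalman_update_gain(mu, S, x, M, S_noise)
```

The two forms agree whenever the covariances involved are invertible; the information form is the one that most resembles a sum of inverse covariances.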



8 Information learned 



Once a new set of parameters has been chosen, and for the purpose of evaluating the new update in the context of other possible updates at different scales, using different representational bases, it is useful to have the quantity of information about the surface distribution that is contained in the KR at the maximally informative update. Using the results of appendix 12.6 in equation (50), we have that this information, up to a constant, is given by



''S^ + f/(/x^ - /X-)) ® Sri +%(|S^|) 



, Tr 

2 

- {Tr 
2 



(60) 



Note that the $d$'s (representation basis dimensions) from the $d\log(2\pi)$'s of equation (94) have cancelled. However the $d$'s remain hidden within the terms as matrix dimensions. When considering optimizing learned information against storage resources, one must weigh a separate cost in bits for the memory used against the bits learned, the expression above. Note also that, interestingly, the expression above contains a BIC-like $\log(d)$ dependence term.



9 Search for update parameters 

Now that we know what the update equations for the updating of the KR distribution look like, it is worthwhile considering how an updating scheme might be implemented to acquire information at the appropriate scale. First, we dismiss the notion that we will ever be using the continuous height field $v_s$ (the support of $s$) at any time. None of the update equations force that to happen! Second, since we have concluded that computationally $v_s$ is a discrete set, and since there will always be pathological cases where the surface is much rougher than we care to represent, we acknowledge that fact and proceed by presenting a useful algorithm which allows the updating of the KR while maintaining the ability to explore a large range of scales. The following multigrid-style algorithm provides the general flavor:

• Choose $v_s$ denser by several orders of scale than the current representation, and using other criteria associated with the knowledge of the data acquisition system (see below).

• Choose $\tilde v$ at regular scales intermediate between $v_s$ and the old KR on $v$, and compute the updates on all $\tilde v$ chosen at these scales.

• Compute the information learned at each scale.

• Plot the information learned as a function of increasing density (decreasing scale).

• Choose, based on exploration of the plot and the costs associated with storing the learned information, whether to explore other octaves of scale. If the choice is to explore, repeat the above procedure.

• If the choice is to pick an informationally and storage attractive KR, do this and update the representation accordingly.
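
The steps above can be sketched as a small search loop. This is only a sketch under stated assumptions, not the paper's implementation: `update_at_scale`, `information_learned` and `storage_bits` are hypothetical callables standing in for the GKF update at a candidate scale, the information expression of section 8, and a storage cost model, and the selection rule (bits learned minus a price per stored bit) is an assumption made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

def choose_scale(
    scales: Sequence[float],
    update_at_scale: Callable[[float], object],
    information_learned: Callable[[object], float],
    storage_bits: Callable[[object], float],
    bit_price: float = 1.0,
) -> Tuple[float, object, List[Tuple[float, float, float]]]:
    """Try candidate KR scales and keep the one with the best net gain.

    All callables are hypothetical placeholders; the score
    information - bit_price * storage is an illustrative assumption.
    """
    curve = []   # (scale, information, storage): the "learning curve" to plot
    best = None
    for scale in scales:
        kr = update_at_scale(scale)        # candidate update at this scale
        info = information_learned(kr)     # information learned by this update
        cost = storage_bits(kr)            # storage needed for this KR
        curve.append((scale, info, cost))
        gain = info - bit_price * cost
        if best is None or gain > best[0]:
            best = (gain, scale, kr)
    if best is None:
        raise ValueError("no candidate scales given")
    return best[1], best[2], curve
```

The returned `curve` corresponds to the plot in the list above; exploring another octave of scale amounts to calling the function again with a new set of `scales`.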

In the surface reconstruction problem data often comes in the form of images. The images may come from devices with vastly different resolutions, and the known parameters of pixel size, point spread function and geometry determine the appropriate reconstruction scale. Finally, adapting the surface to resolve at sub-pixel scales requires a memory-aggressive approach which extends the exploration farther out on the learning curve towards smaller, denser representation scales.

10 Conclusion 

Field inference has been generalized from the typical discrete fixed-basis setting to a continuous-basis setting. The problem of surface inference was solved in the context of continuous field inference. Using the approach of acquiring the maximally informative KR distribution, the GKF equations were found. The GKF allows the updated KR parameters to be found at any scale and/or "positions" (abstractly, basis components). The approach allows the learning of information at the relevant scales desired. It provides an information-theoretic justification for location-dependent adaptive multigrid inference. It also effectively provides similar justification for a scale-adaptive MDL method. This is apparently the first time that the maximally informative inference of continuous-basis objects and the multigrid approach have been rigorously justified.
12 Appendices

12.1 Construction of a 2D surface prior 

In this appendix we first introduce the reader to the fourier representation of a 
gaussian process, then using the notions developed find the representation for 
a 2D gaussian process over the plane, where the correlations of the process 
at points x and y are proportional to $\exp(-k\,|x - y|)$, $k > 0$, a simple
translation-invariant choice for the form of the correlation structure of the 
probability density of surfaces having the plane as support. The utility for the 
GKF of having this process is that it serves as a simply computed algorithmic 
representation of the prior for surfaces having the plane as support. 
12.1.1 The discrete gaussian process

Consider $f(n, c)$, $n \in Z_N = \{-N, \ldots, -1, 0, 1, \ldots, N\}$, a discrete process with expression as the fourier expansion

$$f(n, c) = \sum_{k=-N}^{N} c_k e^{ikn} \tag{61}$$

where the coefficients $c = (c_k)$ are constrained by $f \in \mathbb{R}$ so that $c_k = c_{-k}^*$, and the $n$ and $k$ range over $Z_N$. Let the coefficients be random variables: $c_k = x_k + i y_k$ with $x_k \sim N(0, \sigma_k)$ and $y_k \sim N(0, \sigma_k)$ both gaussian distributed random variables with mean $0$ and standard deviation $\sigma_k$. Now, dropping the $k$'s, the joint density of $(x, y)$ is given by

$$P_{xy}(x, y) = \frac{e^{-(x^2 + y^2)/2\sigma^2}}{2\pi\sigma^2} \tag{62}$$

From this the joint density of $(r, \theta)$ where $r = \sqrt{x^2 + y^2}$ and $\theta = \arctan(y/x)$ is given by

$$P_{r\theta}(r, \theta) = \frac{r\, e^{-r^2/2\sigma^2}}{2\pi\sigma^2} \tag{63}$$

The density of $r$ is given directly by integrating over $\theta$

$$P_r(r) = \frac{r\, e^{-r^2/2\sigma^2}}{\sigma^2} \tag{64}$$

while the density of $\theta$ is given directly by integrating over $r$

$$P_\theta(\theta) = \frac{1}{2\pi} \tag{65}$$

Making a change of variables, the density of $cc^* = x^2 + y^2 = r^2$ is given by the exponential distribution

$$P_{cc^*}(u) = \frac{e^{-u/2\sigma^2}}{2\sigma^2} \tag{66}$$

The distribution of $c_k + c_{-k} = 2\,\mathrm{Re}[c_k] = 2x_k$, $k > 0$, is of interest because the process is real.

$$P_{c+c^*}(u) = \frac{e^{-u^2/2(2\sigma)^2}}{\sqrt{2\pi(2\sigma)^2}} \tag{67}$$

which is just a gaussian with zero mean but twice the standard deviation (four times the variance) of the components $x$ and $y$ of $c$. Note that the actual terms in equation (61), $c_k e^{ikn} + c_{-k} e^{-ikn} = 2\,\mathrm{Re}[c_k e^{ikn}]$, also have the distribution of equation (67) since the phase of $c_k$ is uniformly distributed in $[0, 2\pi]$.

Now, given a set of integers $\zeta \subset Z_N$ we may ask for the density of the sampled values of the process $f$ at $\zeta = (n_1, n_2, \ldots, n_m)$

$$f(\zeta) = (f(n_1), f(n_2), \ldots, f(n_m)), \tag{68}$$

where $m = |\zeta|$, $n_i \in Z_N$, $i = 1, \ldots, m$. Define

$$f(\zeta, c) = (f(n_1, c), f(n_2, c), \ldots, f(n_m, c)) \tag{69}$$

Then the probability density function which describes the sampled values is

$$P(f(\zeta)) = \int \delta(f(\zeta) - f(\zeta, c))\, P(c)\, dc \tag{70}$$

where

$$P(c) = P(c_0) \prod_{k=1}^{N} P(c_k + c_{-k}) \tag{71}$$

Note that the density $P(f(\zeta))$ is multivariate gaussian since the representation of $f(\zeta, c)$ as a fourier series shows that it is the sum of gaussian random vectors with components $2\,\mathrm{Re}[c_k e^{ikn}]$. The covariances of the process are found as

$$
\begin{aligned}
\Sigma_{m,n} = E[f(m) f(n)] &= E[f(m) f^*(n)] \\
&= E\Big[ \sum_{k,l=-N}^{N} c_k c_l^*\, e^{i(km - ln)} \Big] \\
&= \sum_{k=-N}^{N} E[c_k c_k^*]\, e^{ik(m-n)} \\
&= F[E[c_k c_k^*]](m - n)
\end{aligned}
\tag{72}
$$

where we used the fact that the coefficients of different frequency are uncorrelated, i.e. $E[c_k c_l^*] = 0$ for $k \ne l$. Define the power spectrum $R(k)$ as

$$R(k) = E[c_k c_k^*] \tag{73}$$

Then we have that the covariance is given by the fourier transform of the power spectrum,

$$\Sigma_{m,n} = E[f(m) f(n)] = F[R](m - n) = \Sigma_{m-n} \tag{74}$$

where we have acknowledged that the covariance structure is dependent only upon the difference $m - n$. From this we see that the inverse fourier transform of the covariance is the power spectrum,

$$F^{-1}[\Sigma](k) = R(k) \tag{75}$$

Finally, note that the density of $c_k c_k^*$ given by equation (66) allows us to infer the parameters $\sigma_k$, which are the standard deviations of the gaussian processes $x_k$ and $y_k$ underlying the coefficients $c_k$, since from that equation

$$E[c_k c_k^*] = \int_0^\infty u\, \frac{e^{-u/2\sigma_k^2}}{2\sigma_k^2}\, du = 2\sigma_k^2 \tag{76}$$

In the next section the basis for gaussian processes developed here is extended to the continuous 2D case to compute the power spectrum of a process specified by a continuous-basis covariance structure.
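
To make the relation between power spectrum and covariance concrete, the following minimal sketch builds the covariance matrix of a few sampled sites directly from a spectrum, using $\Sigma_{m-n} = \sum_k R(k) e^{ik(m-n)}$. The function names and the particular spectrum $R(k) = 2/(1 + k^2)$ are assumptions chosen for illustration only.

```python
import numpy as np

def covariance_from_spectrum(R, lags):
    """Sigma at integer lags: sum_k R(k) exp(i k lag), with k = -N..N.

    R holds R(-N), ..., R(N). For a real process R(k) = R(-k),
    so the sum is real and only the real part is returned.
    """
    R = np.asarray(R, dtype=float)
    N = (len(R) - 1) // 2
    k = np.arange(-N, N + 1)
    lags = np.asarray(lags)
    return np.real(np.exp(1j * np.outer(lags, k)) @ R)

def covariance_matrix(R, sites):
    """Covariance matrix of the samples f(n_1), ..., f(n_m)."""
    sites = np.asarray(sites)
    lag = sites[:, None] - sites[None, :]
    return covariance_from_spectrum(R, lag.ravel()).reshape(lag.shape)

# Illustrative spectrum (an assumption, not taken from the paper).
N = 8
k = np.arange(-N, N + 1)
R = 2.0 / (1.0 + k**2)
Sigma = covariance_matrix(R, sites=[-2, 0, 1, 5])
```

Because the spectrum is non-negative, the resulting matrix is symmetric and positive semi-definite, and its diagonal equals the total power $\sum_k R(k)$.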

12.1.2 The continuous-basis 2D process 

Similar to the development in the last section, in two dimensions, given the continuous-basis covariance $\Sigma_x = \exp(-k|x|)$, $k > 0$, the power spectrum is found as the inverse fourier transform of the covariance, i.e.

$$R(\mathbf{u} = (u, v)) = F^{-1}[\Sigma_x](u, v) = \iint e^{-k|(x,y)|}\, e^{-iux}\, e^{-ivy}\, dx\, dy \tag{77}$$

Make the change of variables $(x, y) \to (r, \theta)$ so that $x = r\cos(\theta)$, $y = r\sin(\theta)$, then

$$R(u, v) = \int_0^\infty \int_0^{2\pi} e^{-kr}\, e^{-ir(u\cos(\theta) + v\sin(\theta))}\, r\, dr\, d\theta \tag{78}$$

For simplicity, make the further change of variables $(u, v) \to (s, \phi)$ so that $u = s\cos(\phi)$, $v = s\sin(\phi)$, so that

$$R(s) = \int_0^\infty r e^{-kr} \int_0^{2\pi} e^{-irs\cos(\theta - \phi)}\, d\theta\, dr = 2\pi \int_0^\infty r e^{-kr} J_0(rs)\, dr \tag{79}$$

Finally,

$$R(s) = \frac{2\pi k}{(k^2 + s^2)^{3/2}} \tag{80}$$

Note that we have neglected the proportionality constant $1/2\pi$ in the fourier transform, amounting to normalizing the delta function to $2\pi$, and have scaled $u$ to units of cycles per $2\pi$. Note also that both the covariance of the process and the power spectrum scale with the same proportionality constant. Harmonic analysis is discussed in [Q]
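
The radial integral $2\pi \int_0^\infty r e^{-kr} J_0(rs)\, dr$ has the known closed form $2\pi k/(k^2 + s^2)^{3/2}$. The sketch below checks this numerically. It is an illustration only: the Bessel function is evaluated from its integral representation $J_0(x) = \frac{1}{\pi}\int_0^\pi \cos(x \sin t)\, dt$ rather than from a special-function library, and the grid sizes and truncation radius are assumptions that suit the moderate values of $s$ and $k$ used here.

```python
import numpy as np

def bessel_j0(x, n_theta=512):
    """J0(x) = (1/pi) * integral_0^pi cos(x sin t) dt, by the midpoint rule."""
    t = (np.arange(n_theta) + 0.5) * np.pi / n_theta
    return np.cos(np.multiply.outer(x, np.sin(t))).mean(axis=-1)

def spectrum_numeric(s, k, r_max_times_k=50.0, n_r=5001):
    """2*pi * integral_0^inf r exp(-k r) J0(r s) dr, by the trapezoid rule.

    The integral is truncated at r = r_max_times_k / k, where exp(-k r)
    is negligible.
    """
    r = np.linspace(0.0, r_max_times_k / k, n_r)
    y = r * np.exp(-k * r) * bessel_j0(r * s)
    dr = r[1] - r[0]
    return 2.0 * np.pi * dr * (y.sum() - 0.5 * (y[0] + y[-1]))

def spectrum_closed_form(s, k):
    """Closed form 2*pi*k / (k^2 + s^2)^(3/2)."""
    return 2.0 * np.pi * k / (k**2 + s**2) ** 1.5
```

For example, `spectrum_numeric(2.0, 1.0)` and `spectrum_closed_form(2.0, 1.0)` should agree to within the discretization error of the grids.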

12.2 Multinormal density MGF 

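The moment generating function for a probability distribution $f$ is defined as the functional

$$M[f](\lambda) = E_f\big[e^{\mathrm{Tr}[U(\lambda, x)]}\big] \tag{81}$$

where $U(y, z)$ is defined such that $U = [U_{ij}]$ and $U_{ij}(y, z) := y_i z_j$, from which holds the property

$$\left. \frac{\partial^k M[f](\lambda)}{\partial\lambda_{i_1} \cdots \partial\lambda_{i_k}} \right|_{\lambda = 0} = E_f[x_{i_1} \cdots x_{i_k}] \tag{82}$$

i.e. the moments are found as derivatives of the MGF with respect to the parameter $\lambda$ at $\lambda = 0$.

Take the multinormal density function for $x$

$$
\begin{aligned}
P(x \mid \Theta) &= N(\Theta)(x) \\
&= N(\mu, \Sigma)(x) \\
&= \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\big(-\tfrac12 \mathrm{Tr}[U(x - \mu) \otimes \Sigma^{-1}]\big)
\end{aligned}
\tag{83}
$$

where $U(y)$ is defined such that $U_{ij}(y) := U_{ij}(y, y)$ and $d = \mathrm{Dim}(x)$. The MGF of $N(\Theta)(x)$ is then given by

$$
\begin{aligned}
M[N(\Theta)(x)](\lambda) &= E\big[e^{\mathrm{Tr}[U(\lambda, x)]} \mid \Theta\big] \\
&= \int \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\big(-\tfrac12 \mathrm{Tr}[U(x - \mu) \otimes \Sigma^{-1}] + \mathrm{Tr}[U(\lambda, x)]\big)\, dx
\end{aligned}
\tag{84}
$$

Minus twice the exponent of the integrand above may be written as

$$
\begin{aligned}
\mathrm{Tr}[U(x - \mu) \otimes \Sigma^{-1}] - 2\,\mathrm{Tr}[U(\lambda, x)] &= \mathrm{Tr}[U(x - (\mu + \lambda\Sigma)) \otimes \Sigma^{-1}] + \mathrm{Tr}[U(\mu) \otimes \Sigma^{-1}] - \mathrm{Tr}[U(\mu + \lambda\Sigma) \otimes \Sigma^{-1}] \\
&= \mathrm{Tr}[U(x - (\mu + \lambda\Sigma)) \otimes \Sigma^{-1}] - \mathrm{Tr}[U(\lambda) \otimes \Sigma] - 2\,\mathrm{Tr}[U(\lambda, \mu)]
\end{aligned}
\tag{85}
$$

from which the moment generating function is immediately found as

$$M[N(\Theta)(x)](\lambda) = \exp\big(\mathrm{Tr}[U(\mu, \lambda)] + \tfrac12 \mathrm{Tr}[U(\lambda) \otimes \Sigma]\big) \tag{86}$$

From the above we have

$$
\begin{aligned}
E[x_i \mid \Theta] &= \mu_i \\
E[(x_i - \mu_i)(x_j - \mu_j) \mid \Theta] &= \Sigma_{ij}
\end{aligned}
\tag{87}
$$

which agrees with the calculation of appendix 12.2. Two things to note: 1. The inverse of $\Sigma$ is assumed to exist. 2. All moments are determined by simple products and sums of the parameters $(\mu, \Sigma)$.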



12.3 Multinormal linear change of variables

Letting $y = Ax$ be the change of variables, where $P(x \mid \Theta) = N(\Theta)(x)$, the MGF of the density $P(y \mid \Theta)$ is found from the MGF of the density for $P(x \mid \Theta)$ in a straightforward manner as

$$
\begin{aligned}
M[P(y \mid \Theta)](\lambda) &= E\big[e^{\mathrm{Tr}[U(\lambda, y)]} \mid \Theta\big] \\
&= \exp\big(\mathrm{Tr}[U(\mu, A^T\lambda)] + \tfrac12 \mathrm{Tr}[U(A^T\lambda) \otimes \Sigma]\big) \\
&= \exp\big(\mathrm{Tr}[U(A\mu, \lambda)] + \tfrac12 \mathrm{Tr}[U(\lambda) \otimes (A\Sigma A^T)]\big)
\end{aligned}
\tag{89}
$$

Note that the dropped subscripts $x$ and $y$ of the $\Theta$ and $\lambda$ are easily determined by the context, and that the density used to take the expectation naturally changed in the equation above from $P(y \mid \Theta)$ to $P(x \mid \Theta)$ without confusion. With this result and referring to equation (86) and preceding we find that the density for $y$ is multinormal with

$$
\begin{aligned}
\mu_y &= A\mu_x \\
\Sigma_y &= A\Sigma_x A^T
\end{aligned}
\tag{90}
$$

Note that everywhere the condition of $A$ was neither mentioned nor assumed, thus $A$ may be a rectangular matrix or otherwise not of full rank.

12.4 Multinormal projections 

Another useful operation is that of projection onto a subset of the components of the argument of the multinormal distribution. Projections may be trivially represented as a linear operation, where the "projection matrix" is typically a rectangular matrix having a single element of value 1 in each row and at most one in each column, zeroes elsewhere. Finding the distribution of the projected variables is equivalent to the operation of marginalizing over the components not in the projection. Let $A$ be the projection matrix selecting a subset of the variables of $x$ as $y = Ax$. Then, using the result of section 12.3 we immediately find integrals of the form

$$\int N(\mu, \Sigma)(x)\, dx_{\setminus y} = N(A\mu, A\Sigma A^T)(y) \tag{91}$$

Both the vector $A\mu$ and the matrix $A\Sigma A^T$ are now just appropriately rearranged pieces of the original vector $\mu$ and matrix $\Sigma$. Specifically, if $y_k = x_{i_k}$ then

$$[A\Sigma A^T]_{pq} = \Sigma_{i_p i_q}$$
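
As an illustration of projection as marginalization, the sketch below builds a rectangular 0/1 projection matrix and confirms that $A\mu$ and $A\Sigma A^T$ are just the selected entries of $\mu$ and $\Sigma$. The helper names `projection_matrix` and `marginalize` and the numbers used are assumptions for illustration.

```python
import numpy as np

def projection_matrix(indices, dim):
    """Rectangular 0/1 matrix A selecting components `indices` of a dim-vector."""
    A = np.zeros((len(indices), dim))
    A[np.arange(len(indices)), indices] = 1.0
    return A

def marginalize(mu, Sigma, indices):
    """Parameters (A mu, A Sigma A^T) of the marginal over the selected components."""
    A = projection_matrix(indices, len(mu))
    return A @ mu, A @ Sigma @ A.T

# Illustrative numbers only.
mu = np.array([1.0, 2.0, 3.0, 4.0])
L = np.array([[2.0, 0.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 0.0],
              [0.0, 1.0, 2.0, 0.0],
              [1.0, 0.0, 1.0, 2.0]])
Sigma = L @ L.T                      # a symmetric positive definite matrix
mu_y, Sigma_y = marginalize(mu, Sigma, [3, 0])
```

In practice one would index `mu` and `Sigma` directly; the explicit matrix `A` is shown only to mirror the notation of the text.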
12.5 Multinormal multiplication

One operation which frequently occurs in Bayesian inference is that of taking the product of two multinormal distributions of the same variable and normalizing that product to find a new distribution. Finding the new $\Theta = (\mu, \Sigma)$ amounts to completing the square, but it is useful to state the result, and we do this here. Let $\Theta_1 = (\mu_1, \Sigma_1)$ and $\Theta_2 = (\mu_2, \Sigma_2)$ be the parameters of the multinormal distributions in the product. Then

$$
\begin{aligned}
\mu &= \Sigma\,(\Sigma_1^{-1}\mu_1 + \Sigma_2^{-1}\mu_2) \\
\Sigma &= (\Sigma_1^{-1} + \Sigma_2^{-1})^{-1}
\end{aligned}
\tag{92}
$$
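
A minimal numerical sketch of this product rule follows. It computes the combined parameters and provides a log-density helper so that one can check that the product of the two densities differs from the combined density only by a constant factor. The function names and the numbers are illustrative assumptions.

```python
import numpy as np

def multiply_gaussians(mu1, S1, mu2, S2):
    """Parameters of the normalized product of N(mu1, S1) and N(mu2, S2)."""
    P1, P2 = np.linalg.inv(S1), np.linalg.inv(S2)   # precisions add
    S = np.linalg.inv(P1 + P2)
    mu = S @ (P1 @ mu1 + P2 @ mu2)
    return mu, S

def log_gauss(x, mu, S):
    """Log of the multinormal density N(mu, S) evaluated at x."""
    d = len(mu)
    r = x - mu
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (r @ np.linalg.solve(S, r) + logdet + d * np.log(2.0 * np.pi))

# Illustrative numbers only.
mu1, S1 = np.array([0.0, 1.0]), np.array([[2.0, 0.3], [0.3, 1.0]])
mu2, S2 = np.array([1.0, -1.0]), np.array([[1.0, -0.2], [-0.2, 0.5]])
mu, S = multiply_gaussians(mu1, S1, mu2, S2)
```

The difference `log_gauss(x, mu1, S1) + log_gauss(x, mu2, S2) - log_gauss(x, mu, S)` is the same for every `x`; it is the log of the normalization constant of the product.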



12.6 Expected uncertainty in multinormals 



It is useful to know the expected uncertainty of one gaussian distribution in the context of another. Consider the quantity

$$E[-\log(P(\Theta_2)(x)) \mid \Theta_1] = -\int N(\mu_1, \Sigma_1)(x)\, \log(N(\mu_2, \Sigma_2)(x))\, dx \tag{93}$$

which occurs in similar form in the development of the Generalized Kalman Filter in the main text and represents the expected uncertainty, or entropy, of the surface representation in the context of the updated surface distribution. The value of this integral is found straightforwardly using the results mentioned in appendix 12.2 as

$$
\begin{aligned}
E[-\log(N(\mu_2, \Sigma_2)(x)) \mid \Theta_1] &= \tfrac12 E\big[\mathrm{Tr}[U(x - \mu_2) \otimes \Sigma_2^{-1}] \mid \Theta_1\big] + \tfrac{d}{2}\log(2\pi) + \tfrac12 \log(|\Sigma_2|) \\
&= \tfrac12 \mathrm{Tr}\big[(\Sigma_1 + U(\mu_1 - \mu_2)) \otimes \Sigma_2^{-1}\big] + \tfrac{d}{2}\log(2\pi) + \tfrac12 \log(|\Sigma_2|)
\end{aligned}
\tag{94}
$$
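
The closed form of the expected uncertainty can be evaluated directly. The sketch below is illustrative only: the function name and numbers are assumptions, and natural logarithms are used, so the result is in nats rather than bits. When $\Theta_2 = \Theta_1$ the expression reduces to the entropy of the first distribution, and any other choice of $\Theta_2$ gives a larger value.

```python
import numpy as np

def expected_uncertainty(mu1, S1, mu2, S2):
    """E[-log N(mu2, S2)(x)] for x distributed as N(mu1, S1), in nats."""
    d = len(mu1)
    diff = mu1 - mu2
    second_moment = S1 + np.outer(diff, diff)      # Sigma_1 + U(mu_1 - mu_2)
    _, logdet = np.linalg.slogdet(S2)
    return (0.5 * np.trace(second_moment @ np.linalg.inv(S2))
            + 0.5 * d * np.log(2.0 * np.pi)
            + 0.5 * logdet)

# Illustrative numbers only.
mu1 = np.array([0.0, 1.0])
S1 = np.array([[2.0, 0.3], [0.3, 1.0]])
entropy = expected_uncertainty(mu1, S1, mu1, S1)
```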



12.7 Maximizing the expected information 

Varying $\Theta_2$, the minimum value of the uncertainty above occurs when $\Theta_2 = \Theta_1$. That this is true for the $\mu$ component of $\Theta_2$ is immediate from the positive definite quadratic nature of the first term. For the $\Sigma$ component the following fact, which follows from the properties of determinants and matrix inverses, facilitates the result:

$$\frac{\partial \log|\Sigma|}{\partial \Sigma_{kl}} = \Sigma^{-1}_{kl} \tag{95}$$



12.8 Notes on matrix inverses and submatrices 

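Given the invertible matrix $V$, composed in the following manner of submatrices $V_{11}$, $V_{12}$, $V_{21}$, $V_{22}$:

$$V = \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix} \tag{96}$$

and its inverse, with blocks written with superscripts,

$$V^{-1} = \begin{bmatrix} V^{11} & V^{12} \\ V^{21} & V^{22} \end{bmatrix} \tag{97}$$

then it is immediate that the following relationships hold among the submatrices

$$
\begin{bmatrix} I_{11} & N_{12} \\ N_{21} & I_{22} \end{bmatrix}
=
\begin{bmatrix}
V_{11}V^{11} + V_{12}V^{21} & V_{11}V^{12} + V_{12}V^{22} \\
V_{21}V^{11} + V_{22}V^{21} & V_{21}V^{12} + V_{22}V^{22}
\end{bmatrix}
\tag{98}
$$

where $I$ and $N$ represent the identity and zero matrices respectively. Any quadratic operator $x^TQx$ may be decomposed using projection matrices $A$ and $\bar A$, where these are diagonal matrices with one and zero entries only, and where

$$A + \bar A = I \tag{99}$$

in the following manner

$$
\begin{aligned}
x^TQx &= x^T(A + \bar A)\,Q\,(A + \bar A)x \\
&= x_A^T Q_{AA} x_A + x_A^T Q_{A\bar A} x_{\bar A} + x_{\bar A}^T Q_{\bar A A} x_A + x_{\bar A}^T Q_{\bar A \bar A} x_{\bar A}
\end{aligned}
\tag{100}
$$

Now, assume $Q$ is symmetric and that it, $Q_{AA}$ and $Q_{\bar A \bar A}$ are all invertible, and rewrite this form as the sum of two terms as follows

$$
\begin{aligned}
x^TQx &= (x_A - \alpha)^T Q_{AA} (x_A - \alpha) + C(x_{\bar A}) \\
&= x_A^T Q_{AA} x_A + x_A^T Q_{A\bar A} x_{\bar A} + x_{\bar A}^T Q_{\bar A A} x_A + \alpha^T Q_{AA} \alpha + C(x_{\bar A})
\end{aligned}
\tag{101}
$$

where $\alpha = -(Q_{AA})^{-1} Q_{A\bar A}\, x_{\bar A}$. Thus

$$C(x_{\bar A}) = x_{\bar A}^T \big(Q_{\bar A \bar A} - Q_{\bar A A}(Q_{AA})^{-1} Q_{A\bar A}\big)\, x_{\bar A} \tag{102}$$

Applying the identities of equation (98) to $Q$ and the blocks $Q^{A\bar A}$, $Q^{\bar A \bar A}$ of its inverse,

$$Q_{\bar A A} Q^{A\bar A} + Q_{\bar A \bar A} Q^{\bar A \bar A} = I_{\bar A \bar A} \tag{103}$$

followed by

$$Q_{AA} Q^{A\bar A} + Q_{A\bar A} Q^{\bar A \bar A} = N_{A\bar A} \tag{104}$$

we find that

$$Q_{\bar A \bar A} - Q_{\bar A A}(Q_{AA})^{-1} Q_{A\bar A} = (Q^{\bar A \bar A})^{-1} \tag{105}$$

so that

$$C(x_{\bar A}) = x_{\bar A}^T (Q^{\bar A \bar A})^{-1} x_{\bar A} \tag{106}$$

which immediately provides an alternate method for marginalizing gaussian distributions.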
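
The block identity above says that marginalizing a gaussian can be done on the inverse covariance (precision) matrix directly: the precision of the retained components is a Schur complement, which equals the inverse of the corresponding block of the covariance. A minimal sketch, with the function name `marginal_precision` and the numbers being illustrative assumptions:

```python
import numpy as np

def marginal_precision(Q, keep):
    """Precision of the marginal over `keep`, for a symmetric precision matrix Q.

    Computes Q_kk - Q_kd Q_dd^{-1} Q_dk, where d indexes the components
    that are integrated out.
    """
    keep = np.asarray(keep)
    drop = np.setdiff1d(np.arange(Q.shape[0]), keep)
    Q_kk = Q[np.ix_(keep, keep)]
    Q_kd = Q[np.ix_(keep, drop)]
    Q_dd = Q[np.ix_(drop, drop)]
    return Q_kk - Q_kd @ np.linalg.solve(Q_dd, Q_kd.T)

# Illustrative symmetric positive definite precision matrix.
L = np.array([[2.0, 0.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 0.0],
              [0.0, 1.0, 2.0, 0.0],
              [1.0, 0.0, 1.0, 2.0]])
Q = L @ L.T
keep = [1, 3]
prec = marginal_precision(Q, keep)
```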

12.9 Alternate inverse forms 

In the GKF update equations, expressions for updating inverse matrices in terms of the sum of other inverse matrices occur. Because one of the summand matrices may not be well-conditioned, it is of interest to find an expression for the updated matrix in terms of the other matrices which explicitly is not a function of the inverse matrices. Thus, let $P$, $Q$, $R$ be invertible matrices such that

$$P^{-1} = Q^{-1} + R^{-1} \tag{107}$$

Then we find

$$P = Q - Q(Q + R)^{-1}Q \tag{108}$$

by the following direct substitution

$$
\begin{aligned}
PP^{-1} &= \big(Q - Q(Q + R)^{-1}Q\big)\big(Q^{-1} + R^{-1}\big) \\
&= I - Q\big[(Q + R)^{-1}(I + QR^{-1}) - R^{-1}\big] \\
&= I
\end{aligned}
\tag{109}
$$
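
A short numerical check of this identity follows. The function name `combine_without_input_inverses` and the matrices are illustrative assumptions. The point of the alternate form is that only $Q + R$ is inverted (here through a linear solve), never $Q$ or $R$ individually.

```python
import numpy as np

def combine_without_input_inverses(Q, R):
    """P with P^{-1} = Q^{-1} + R^{-1}, computed as Q - Q (Q + R)^{-1} Q."""
    return Q - Q @ np.linalg.solve(Q + R, Q)

# Illustrative numbers only.
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
R = np.array([[1.0, 0.2], [0.2, 3.0]])
P = combine_without_input_inverses(Q, R)
P_direct = np.linalg.inv(np.linalg.inv(Q) + np.linalg.inv(R))
```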
12.10 Nonlinear forward projection

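In the nonlinear forward projection case the projection is given by $f(s)$, where $f(\cdot)$ is a nonlinear function of $s$ rather than the linear form $Ms$. Because the derivative of the forward projection is often a straightforward object to compute, expand $f(s)$ about the mean of the old surface, $\mu_s$

$$x \approx f(\mu_s) + \left.\frac{\partial f}{\partial s}\right|_{\mu_s} (s - \mu_s) + \epsilon \tag{110}$$

Letting $M = \left.\frac{\partial f}{\partial s}\right|_{\mu_s}$ we have

$$
\begin{aligned}
P(x \mid s, \phi) &= N\big((f(\mu_s) - M\mu_s) + Ms,\ \Sigma_\phi\big)(x) \\
&= N(Ms, \Sigma_\phi)\big(x - (f(\mu_s) - M\mu_s)\big)
\end{aligned}
\tag{111}
$$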

so that the appropriate changes to be made to the GKF update equations are simply

$$x \to x - (f(\mu_s) - M\mu_s) \tag{112}$$

while everything else otherwise remains the same.
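
The linearization can be sketched numerically as follows. This is an illustration under stated assumptions: the Jacobian is approximated by central finite differences (an analytic derivative would normally be preferred), and the forward projection `f`, the mean `mu_s` and the data `x` are hypothetical examples, not taken from the paper.

```python
import numpy as np

def jacobian(f, s0, eps=1e-6):
    """Central finite-difference Jacobian of f at s0."""
    s0 = np.asarray(s0, dtype=float)
    f0 = np.asarray(f(s0))
    J = np.zeros((f0.size, s0.size))
    for j in range(s0.size):
        step = np.zeros_like(s0)
        step[j] = eps
        J[:, j] = (np.asarray(f(s0 + step)) - np.asarray(f(s0 - step))) / (2.0 * eps)
    return J

def linearize(f, mu_s, x):
    """Return (M, x_eff): M = df/ds at mu_s and x_eff = x - (f(mu_s) - M mu_s)."""
    mu_s = np.asarray(mu_s, dtype=float)
    M = jacobian(f, mu_s)
    x_eff = np.asarray(x, dtype=float) - (np.asarray(f(mu_s)) - M @ mu_s)
    return M, x_eff

def f(s):
    """Hypothetical nonlinear forward projection, for illustration only."""
    return np.array([s[0] ** 2, s[0] * s[1]])

mu_s = np.array([1.0, 2.0])
x = np.array([1.5, 2.5])
M, x_eff = linearize(f, mu_s, x)
```

With `M` and `x_eff` in hand, a linear update can be applied unchanged, treating `x_eff` as the data and `M` as the forward projection.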
GKF Update Loop Equation



[Figure 1 diagram; node labels: $x^{n+1}$, $P(s \mid \Theta^n)$, $P(s \mid \Theta^n, x^{n+1})$, $\Theta^n$, $\Theta^{n+1}$, "Maxinfo Approx."]

The elements going into $P(s \mid \Theta^n)$ are the prior, restricted to some knowledge $H$ about the field, $P(s \mid H)$ (in the main text example, $H$ is the set of known surface height field values), and the Knowledge Representation (KR) distribution $P(H \mid \Theta^n)$, which is the learned knowledge about the specifics of the surface at the $n$'th iteration of the GKF.

These form the approximate posterior $P(s \mid \Theta^n)$ given by the integral over $H$ of the product of the KR distribution and the prior distribution given $H$ known, that is

$$P(s \mid \Theta^n) = \int P(s \mid H)\, P(H \mid \Theta^n)\, dH \tag{1}$$

At update $n+1$, the new data and the approximate posterior from iteration $n$ are incorporated using the likelihood $P(x^{n+1} \mid s)$ and Bayes' theorem to produce the data-dependent posterior written $P(s \mid \Theta^n, x^{n+1})$. Then the new KR, which captures an approximation to this exact posterior using (1) above with $n \to n+1$ via maximally informative statistical inference, completes the GKF loop.

Figure 1 - Generalized Kalman Filter Update Loop



